$\epsilon$-Greedy Exploration


On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration

Neural Information Processing Systems

This paper provides a theoretical understanding of the deep Q-network (DQN) with $\epsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous empirical achievements of the DQN, its theoretical characterization remains underexplored. First, the exploration strategy is either impractical or ignored in the existing analysis. Second, in contrast to conventional Q-learning algorithms, the DQN employs a target network and experience replay to obtain an unbiased estimate of the mean-square Bellman error (MSBE) used in training the Q-network. However, the existing theoretical analysis of DQNs either lacks a convergence analysis or bypasses the technical challenges by deploying a significantly overparameterized neural network, which is not computationally efficient. This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with the $\epsilon$-greedy policy. We prove that an iterative procedure with decaying $\epsilon$ converges geometrically to the optimal Q-value function. Moreover, a higher $\epsilon$ enlarges the region of convergence but slows down convergence, while the opposite holds for a lower $\epsilon$. Experiments justify our established theoretical insights on DQNs.
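For concreteness, below is a minimal NumPy sketch (not the paper's implementation) of the two mechanisms this abstract highlights: $\epsilon$-greedy action selection with a geometrically decaying schedule, and the MSBE computed against a frozen target network on transitions sampled from a replay buffer. The names q_net, target_net, and the schedule constants are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def msbe(q_net, target_net, batch, gamma=0.99):
    """Mean-square Bellman error against a frozen target network.

    `batch` holds (states, actions, rewards, next_states) sampled from a
    replay buffer; sampling from the buffer is what makes this an
    (approximately) unbiased estimate of the MSBE.
    """
    s, a, r, s_next = batch
    targets = r + gamma * target_net(s_next).max(axis=1)   # Bellman targets
    predictions = q_net(s)[np.arange(len(a)), a]           # Q(s, a) for taken actions
    return np.mean((predictions - targets) ** 2)

# Geometrically decaying schedule eps_t = max(eps_min, eps_0 * decay**t).
eps_0, decay, eps_min = 1.0, 0.95, 0.05
q_values = np.array([0.1, 0.5, 0.2])
for t in range(100):
    eps_t = max(eps_min, eps_0 * decay ** t)
    action = epsilon_greedy(q_values, eps_t)
```

The schedule mirrors the abstract's qualitative finding: a large $\epsilon$ early widens exploration (a bigger region of convergence), while the geometric decay speeds up convergence later on.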


Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration

Neural Information Processing Systems

This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL) with $\epsilon$-greedy exploration under the online setting. This problem setting is motivated by the successful deep Q-network (DQN) framework, which falls in this regime. In this work, we provide an initial attempt at theoretically understanding deep RL from the perspective of function classes and neural network architectures (e.g., width and depth) beyond the ``linear'' regime. To be specific, we focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed with Besov (and Barron) function spaces, respectively, which aims at approximating an $\alpha$-smooth Q-function in a $d$-dimensional feature space. Moreover, for a two-layer neural network endowed with the Barron space, scaling the width as $\Omega(\sqrt{T})$ is sufficient.
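For reference, the $\epsilon$-greedy exploration strategy that both papers analyze is conventionally defined over a finite action set $\mathcal{A}$ as

```latex
\[
\pi_\epsilon(a \mid s) =
\begin{cases}
1 - \epsilon + \epsilon / |\mathcal{A}|, & a = \arg\max_{a' \in \mathcal{A}} Q(s, a'), \\
\epsilon / |\mathcal{A}|, & \text{otherwise},
\end{cases}
\]
```

so the policy is greedy with probability $1 - \epsilon$ and uniform over $\mathcal{A}$ otherwise; taking $\epsilon \to 0$ recovers the purely greedy policy.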

